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Abstract 



This paper is concerned with the evolution of haploid organisms that reproduce asexually. In a sem- 
inal piece of work, Eigen and coauthors proposed the quasispecies model in an attempt to understand 
r such an evolutionary process. Their work has impacted antiviral treatment and vaccine design strate- 

O^ gies. Yet, predictions of the quasispecies model are at best viewed as a guideline, primarily because it 

assumes an infinite population size, whereas realistic population sizes can be quite small. In this paper 
,— I we consider a population genetics-based model aimed at understanding the evolution of such organisms 

K* with finite population sizes and present a rigorous study of the convergence and computational issues 

t ■ that arise therein. Our first result is structural and shows that, at any time during the evolution, as the 

zj~: population size tends to infinity, the distribution of genomes predicted by our model converges to that 

,__! predicted by the quasispecies model. This justifies the continued use of the quasispecies model to derive 

guidelines for intervention. While the stationary state in the quasispecies model is readily obtained, due 
to the explosion of the state space in our model, exact computations are prohibitive. Our second set 
^^ of results are computational in nature and address this issue. We derive conditions on the parameters 

T—i of evolution under which our stochastic model mixes rapidly. Further, for a class of widely used fit- 

Kjl ness landscapes we give a fast deterministic algorithm which computes the stationary distribution of our 

• ^H model. These computational tools are expected to serve as a framework for the modeling of strategies 

^^ for the deployment of mutagenic drugs. 



Topics: Molecular Evolution, Quasispecies Theory. 
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1 Introduction 

The rapid genomic evolution of viruses such as HIV has made the design of drugs and vaccines with lasting 
activity one of the most difficult challenges of our time. A novel intervention strategy which potentially 
outplays viruses in this evolutionary game was suggested by the pioneering work of Eigen and coworkers 
| Eig71||EMS89l . Eigen and coworkers considered the asexual evolution of a haploid organism and found 



that when the mutation (or evolutionary) rate was small, the organism survived as a collection of closely 
related yet distinct genomes together termed the quasispecies. Viral populations in infected individuals are 
known to exist as such quasispecies MLAlOi Remarkably, this quasispecies model predicted that when the 
mutation rate increased beyond a critical value, called the error threshold, the collection of genomes in the 
quasispecies ceased to be closely related; in fact, a completely random collection of genomic sequences 
was predicted to emerge. This transition with increasing mutation rate thus induced a severe loss of genetic 
information in the quasispecies and has been referred to as an error catastrophe. The generic antiviral drug 
ribavirin has been shown to act as a mutagen-an agent that induces mutations-against poliovirus and trigger 
a severe loss of viral infectivity in culture ICC API. I . This strategy of enhancing the viral mutation rate thus 
appears promising and particularly advantageous because it is unlikely to be susceptible to failure through 
viral evolution-driven development of drug resistance. Mutagenic drugs that attempt to induce an error 
catastrophe are thus being explored as a potential antiviral strategy BCCAOU lGPLL+051 IADL041 , and one 
such drug for HIV is currently under clinical trials [jMHH+llt . 

The success of mutagenic strategies relies on accurate estimates of the error threshold of the pathogens 
under consideration. Notwithstanding the tremendous insights into viral evolution the quasispecies model 
provides, important gaps remain between the quasispecies model and the realistic evolution of viruses and 
other haploid asexual organisms. First, whereas the model assumes an infinite population size and, hence, 
adopts a deterministic approach, real populations are often small enough to lend themselves to substantial 
stochastic effects. For instance, the effective population size of HIV-1 in infected individuals is about 10^ — 
10^ r KAB061 IBSSDll l. which is thought to underlie the strongly stochastic nature of HIV-1 evolution. 
Second, the model assumes a single-peak fitness landscape, where one genomic sequence is assumed to 
be the fittest and all other genomes are equally less fit. Realistic fitness landscapes are far more complex 
llHMC'^llt . There have been significant efforts in the last 30 years to close these gaps HWilOSII . While 
more general landscapes have been successfully considered in the quasispecies case I SH06I . a rigorous 
treatment of the finite population case has remained elusive (see Section [4]). Importantly, it still remains to 
be established whether the insights gained from the quasispecies model, such as the occurrence of an error 
catastrophe, translate to the more realistic, finite population case. 

Here, we consider a finite population model of the asexual evolution of a haploid organism. Following 
standard population genetics-based descriptions [HC06|, the model considers evolution in discrete, non- 
overlapping generations. Within each generation, genomes undergo reproduction (R), selection (S), and 
mutation (M), yielding progeny genomes for the next generation. We analyze this RSM model formally and 
establish the following key results. We show that in the infinite population limit, the expected structure of 
the quasispecies predicted by the RSM model converges to that of the quasispecies model. Thus, insights 
from the quasispecies model may be translated to the finite population scenario. Indeed, we show further 
that the error threshold predicted by the RSM model also converges in the infinite population limit to that 
of the quasispecies model. Finite population models, such as the RSM model, appropriately tuned to mimic 
specific details, such as the fitness landscape, of the pathogens under consideration may thus be employed 
to obtain realistic estimates of the error threshold. 

Unlike the quasispecies model, where the quasispecies is identified readily using black-box eigenvector 



determination algorithms, identifying the expected quasispecies of the RSM model is computationally pro- 
hibitive even for the smallest realistic genome and population sizes. Monte Carlo sampling techniques are 
therefore often resorted to [|Wil05[|AB051lBKP+llllGD10[|TBVD 12l. Here, going beyond the ideas of the 
quasispecies model, we examine the mixing properties of the RSM model. We establish constraints on the 
model parameters under which the RSM model exhibits rapid mixing and therefore allows fast estimation 
of the expected quasispecies structure. Finally, we suggest an algorithm that uses the Markov Chain Monte 
Carlo paradigm to estimate the error threshold in the RSM model. Our study thus serves as a framework for 
elucidating quantitative guidelines for the modeling of intervention strategies that employ mutagenic drugs. 
The paper is organized as follows. In Section|2]we briefly describe the quasispecies model and the notion 
of the error threshold. In Section|3]we setup the finite population RSM model, present our main results and 
outline the techniques employed. In Section |4] we discuss our results in the context of previous studies and 
highlight open problems arising from work. Formal statements of our results are presented in Section [5] 
Detailed proofs are contained in the Appendix. 

2 Preliminaries and Definitions 

2.1 Preliminaries 

We consider the evolution of a population of haploid organisms that reproduce asexually. In this evolution- 
ary process the genome of each individual is modeled as a string of nucleotides. During reproduction, the 
genome is copied with possible mutations, which can be insertions, deletions, or substitutions. In appli- 
cations, such as the modeling of viral evolution, it is often convenient to neglect insertions, deletions, and 
substitutions other than transitions (A to G or C to T , and vice versa). Under this assumption, a genome 
may, without loss of generality, be represented as a binary string. We thus represent a genome as an L-bit 
string a = (ai , 02, . . . , Ol), where a, G {0, 1}. 

The fitness of a genome is then modeled in terms of its propensity to produce copies of itself. Specif- 
ically, the fitness of the genome a is defined by the number of copies Ua of itself that it produces in one 
round of replication (also called one generation). However, during replication, each bit of each of the aa 
offsprings is copied incorrectly with probability /i (called the error or mutation rate), thus potentially giving 
rise to an L-bit string different from a. The fittest genome, also termed the master sequence, is without loss 
of generality assumed to be = (0, . . . , 0) so that ao > a^ for all a 7^ 0. 

The primary cause of the complexity and diversity in the evolution of such organisms is the variety 
of possible fitness landscapes, which a priori can be arbitrary functions from {0, 1}^ to the set of non- 
negative integers. Several special classes of fitness landscapes have been employed in the literature and 
we list the important ones below. We will assume that ac > l- The case ac = for some a's has been 
used IIWK93[lTH07llGD10L and will be discussed in Section m 

1. {General) Here the only condition is that a(y>\. 

2. (Class Invariant l\SH06\ \TH07\ IPMnDlOl \BSSDll\l ) In a class invariant landscape aa depends only 
on the Hamming weight of a. 



3. [Single Peak l\Eig71\ WS89\ \PMnD10\l ) Here, we have ao > 1 and a^ = 1 for all a / 0. 

4. (Multiplicative l\TH07\ \WH96\I ) These are parametrized by «i , . . . , a/, > 1 so that for a given a, aa 

Y[i=i,ai=0^i- 



def 



Other landscapes such as the simpler additive or linear landscapes and more complex correlated landscapes 
have also been employed in the literature IIBS931 IWie97l lvNCM99]| . 

2.2 The Quasispecies Model 

Eigen and coworkers fEig71[|EMS891 gave the following differential equations for the time-evolution of the 



fractional population of the genome a at time t, denoted by X(j{t) 

dXait) 



dt ^ 



x^{t)a^Q-ca -Xcj{t)A{t) for all a. 



The collection of 
utionary process, 



Here, Qat = /^'^"('^ '^)(1 — /i)^ dnias) jg jj^g probabihty that a mutates to T and dH{o,x) is the Hamming 

- def 

distance between o and T. A{t) is the average fitness ^^^'^■^^(O ^^ time t. Defining A^z = cia when a = T 

def 

and otherwise, they showed that the vector of stationary frequencies, v'^ = lim,^oo;ccj(f), is the dominant 

right eigenvector of the value matrix QA at mutation rate /i such that || v^ || i = Y.a ^'u = 1 ^ 

genomes determined by this dominant eigenvector, which marks the culmination of the evo 

is called the quasispecies. It is important to note that the vector v^ is independent of the starting population 

distribution. 

We will mostly be concerned with the discrete time version of the quasispecies model. In the discrete 
time case, f = {0, 1, . . .}, denoting the fraction of genomes of type a at time t by mf , Eigen's equations can 
be written as: 

a def'LTinJarQra 
Lxmjar 

In vector notation, given the fractional population m, at time t, the fractional population m,+i at time f + 1 
is given by ni,+i = r(inf ), where the a co-ordinate r^ of the vector valued function r is defined as 

Again, m, can be shown to converge to v^ irrespective of the starting population distribution as t goes to 
infinity. However, at any finite t, m^ depends on the initial state mo. 

2.3 The Error Threshold 

With the single peak landscape, Eigen et al. observed empirically that there is a critical value pL^ < 0.5 
for the mutation rate /i such that for pi <^ pLc, the quasispecies is dominated by the master sequence, i.e. 
Vu > v'^ Va, whereas when 0.5 > jJ. > He the quasispecies is dispersed approximately uniformly. The 
critical mutation rate He is called the error threshold because the uniform dispersal for jj. > He implies a 
severe loss of representation in the quasispecies of the genetic information encoded by the master sequence. 

Evidently, this dispersal also decreases the mean fitness, A = Y.a'^av'^- 

We note here that despite the notion of the error threshold being widely recognized, no consensus ex- 
ists on its definition; see Wilke IIWil05ll . Since, in most cases, v^ will never become exactly the uniform 
distribution on {0, 1}^, it is clear that the goal is to find the smallest /i such that v^ is close to the uniform 
distribution on all genomes, i.e., the vector with every coordinate equal to 2^^, which we denote by U. To 



'Throughout this paper, we will be dealing with vectors over {0, 1}^. Vectors will be typeset in boldface. The components of a 
vector X will be denoted either by x'^ or by x^ for CT e {0, 1 }^, based on convenience of notation in the context of use. 



define the error threshold we also need a function that measures closeness: e.g. ||v^ — U||i, ||v^ — U||t,o or 
the difference in Shannon entropies of v^ and U, namely |H(v;i) — L\ . Hence, for a given distance function 
d which measures closeness of v^ and U, and a bound on closeness £ > 0, we define 

^ii{e) = min{jU G (0, 1) : ^(v^,U) < e}. 

First note that at /x = 1/2, the steady state vector v^ is exactly U. Hence, /i^(e) < V^ for ^H d,e > 0. 
Second, note that changing the distance function d will change the error threshold quantitatively. Eigen 
and coworkers presented a heuristic argument that the error threshold should scale as i/^ for the single-peak 
model without any rigorous proofs of its existence and without mentioning any closeness function. 

3 A Finite Population Model and Our Main Results 

In this section, we describe at an informal level the salient features of our work, which comprises a finite 
population model to capture molecular evolution, and our theoretical and computational results associated 



with it. We give a high-level technical overview of the methods used to prove our results in Section 3.3 
while precise definitions and formal statements of our results appear in Section |5] Proofs have been moved 
to the Appendix due to considerations of space. 

3.1 A Finite Population (RSM) Model 

We consider the following stochastic discrete time finite population model of evolution which we call the 
RSM model. The parameters are the same as in the quasispecies model: the genome length L, the sequence 
space {0, 1}^, a per bit mutation rate /x and the fitness landscape {aa} ae{Q,iY ^^^^ ^11 fl'tj > 1 and integers. 
At any time t, let Nf be the number of genomes (a random variable) of type a, and fix the total population 
Y,o ^? to be A^. This fixing of the population size to N at each step is the key distinction from the quasispecies 
model and is a new parameter. In each time step, the ensuing evolution is then described in terms of the 
following three steps. 

1. (Reproduction) First, in the reproduction step, each genome o produces flfj copies of itself, giving 

def def 

rise to an intermediate population It = LcTe{o,i}^^!^» where 1^ = OaNf . 

2. (Selection) Second, in the selection step, A'^ genomes are chosen at random without replacement from 
this intermediate population of size /,, resulting in the selection of Sf genomes of type a where 
5f G {0, 1, . . . ,/f } and I^e{o,i}^ Sf=N< h. 

3. (Mutation) Third, in the mutation step, each selected genome is mutated with probability }JL per bit, 
giving rise to the next generation of N^^^^ genomes of type a, such that L(7e{o,i}^^f+i =^- 

The starting state is denoted by No which is typically ggiven hy Nq = N and N^ = for all a 7^ 0, but we 
will often not use this assumption. This RSM model is best viewed as a Markov chain where the state space 
is the set of functions / : {0, 1}^ 1— ;■ {0, 1 , . . . ,A'^} such that I^fj/(cj) = A'^. T hus, the number of states of this 
Markov chain is ( ^^ ) which is roughly N^ . It can be shown (see Fact 5.1 1 that for any < /i < 1 the 
transition matrix of this Markov chain has a unique stationary vector, denoted by tt. tt is indexed by all / 
satisfying the property above and ^y 7r(/) = 1 , i.e., tt is a probability distribution over the state space of the 
RSM Markov chain. 



def 

Let D, = [N^^ /N)^i^rQnL denote the random vector which captures the fractional population at time t. 
It can also be shown that lim,^ooIE [Dj |Do] exists and is independent of Dq. We denote this limit as E [D„o] 
and it can be computed from the stationary vector n as follows: 

Finally, we will subscript D^ with parameters such as IJ.,N when we want to highlight dependence on them, 
e.g. T)t,ii.N- The key questions of interest, especially given the fact that computing n would be prohibitive 
even for small values of L and N, are: 

1. For a fixed t, what does E [Dj |Do] converge to as N increases? 

2. Is there a notion of error threshold in the RSM model? 

3. How to obtain an estimate of E [D„] efficiently? 

We present theoretical results that address all of the above questions. Note that if the answer to the first 
question is that E [D^jDo] converges to the prediction of the quasispecies model m, with the same starting 
states (mo = Dq), then it is important as we can leverage the significant understanding obtained from the 
study of the quasispecies model while incorporating stochastic effects with finite populations. For the second 
question, we need to first define a notion of the error threshold in the RSM model. We do so formally, given 
a distance function d. 

Definition 3.1. Let e >0. 

^^ie,N) = minJAi G (0, 1) : J(E [B^^^im] ,U) < e} , 
where U is the uniform distribution over all genomes of length L. 

What we want to understand is \mij^^oo^'^{£,N). Again, if we can show that the answer here is jJ-d^), then 
we can translate the insights from the quasispecies model to the RSM model. 

For the third question, first note that if we want to estimate the error threshold, we need to be able 
to compute E [Deo] . Secondly, we consider the standard notion of efficiency: the algorithm to estimate 
E [Deo] should be polynomial in the input size. As we noted, the state space of the RSM Markov chain is 
prohibitively large and computing the stationary state is prohibitive. We employ the Markov Chain Monte 
Carlo method and run the RSM process for some time T such that it is guaranteed that distribution from 
which Df is drawn comes statistically close to the distribution from which Doo is drawn, irrespective of 
Dq. Simulating each step of the random walk can be done efficiently. Hence, we are led to the question of 
bounding the mixing time of the RSM Markov chain: the smallest time the finite time distribution needs to 
come close to the steady state distribution for all starting configurations. 

The issue of how the input is presented is also important and we briefly discuss it here. In one model, 
one can be given all a^ which would require bit length about Lcj logoff and can, in principle, be as large 
or even larger than 2^. Often, this is not the case and either the values a^ are given by a simple equation, 
or only some fixed number, say k <^2^ of the values a a are strictly bigger than 1. In the latter case the 
input has bit length roughly 0(^logmaX(ja(j +logL + logi//i). Another case for the input is when aa are 
class invariant. In this case the input is of length roughly 0(LlogmaXcjaCT + logi/^). We now proceed to 
summarize our results. 



3.2 Our Results 

We now give informal statements of our main results, before describing the mathematical techniques em- 
ployed in the proofs of these results. The formal statements of the theorems described here appear in Section 
[5j after a discussion in Section [4j while the formal proofs are deferred to the Appendix. 

3.2.1 Convergence of the Quasispecies and the RSM Model 

Theorem 3.2 (Convergence of the RSM and Quasispecies Models). Fix a fitness landscape A with positive 
entries and a mutation transition matrix Q. Consider the RSM process started with the initial state Dq and 
consider the evolution of the quasispecies model started with the initial state mo = Dq. Then for any fixed 
time t, 

limE[Df|Do]=mf, (3) 

where m, is the fractional population vector at time t starting from mo predicted by the quasispecies model. 

The theorem shows that in the infinite population limit, the stochastic fluctuations of the RSM process 
disappear, and the model converges to the quasispecies model. Informally, the main technical difficulty in 
proving the above theorem is to establish a result of the form limAr^oo]E[Df |D,_i] = limA^^ooIE [D,|Do] with 
probability 1, which would establish convergence to the quasispecies model. The full proof is deferred 
to the Appendix. This convergence result allows us to show that for any distance function d, a finitary 
version of the error threshold, iJ.^{e,N) as defined above, converges to the error threshold nf{£) of the 
quasispecies model, as the population size goes to infinity. These two results provide validation for the finite 
population RSM model by establishing that in the infinite population limit, the predictions from the RSM 
model converge to those of the quasispecies model. We now move on to problems concerning the mixing 
time and other computational issues of the RSM model. 

3.2.2 Computational Results in the RSM Model 

As noted before, a primary computational question in both the quasispecies model and the RSM model is 
the determination of the quasispecies, or the expected population profile at stationarity in the RSM model. 



which can then be used to estimate error thresholds (see Section |3.3| for an overview and Sections 5.3.2 



and 5.3.3 for details). For the quasispecies model, a satisfactory solution to this problem is obtained via 
the observation that the quasispecies is the leading right eigenvector of the QA matrix. The QA matrix is of 
dimension 2^ x 2^, and the above observation can thus be used to obtain efficient algorithms using black-box 
eigenvector finding algorithms for moderate values of L. In the case of class-invariant fitness landscapes, it 
is known IISS82II that one only needs to find the leading eigenvector of an (L + 1 ) x (L + 1 ) matrix. 

However, similar approaches are not as effective for the RSM model. In this case, the stationary dis- 
tribution is the leading eigenvector of the transition matrix ..# of the RSM process which is of dimension 
roughly N'^ . Using ideas similar to those referred to above, one can reduce the running time for computing 
the stationary distribution to N^^^~\ 

Theorem 3.3 (Computation of Steady State in the Class Invariant Case). For any class invariant fitness 
landscape A, there is an algorithm running in time T = 0{N^^^ ') which computes the steady state of the 
RSM process with population size N and the genome length L. 



However, as mentioned before, in many applications, as in the case of HIV, for instance, where N ~ 
10^ — 10^ and L ~ 10"*, the problem is still computationally prohibitive. In these cases, one typically re- 
sorts to Monte Carlo simulations of the RSM process for estimating the population profile at stationar- 
ity IITBVDI2II . and thus we are led to considering the mixing time of the RSM process. The following 
theorem derives conditions on the parameters of evolution under which the RSM model mixes rapidly. 



Theorem 3.4 (Mixing Time of the RSM Process). Given a fitness landscape A, mutation rate }X, the RSM 
process exhibits fast mixing if^l—ljj.) ^^^L + ^ < 1 . 

Having stated our results, we now highlight the techniques employed in the proofs. 



3.3 Overview of Our Technical Contributions 

As before, we will denote by D, the random variable of fractional populations after t steps of the RSM 
process, and by S^ the random variable of the populations of genomes after the replication and selection 
steps of the (? + l)-the step of the RSM process. 



Our convergence result (Theorem 3.2 1. The starting point of the proof of our convergence result is to 
observe that E [D^+i |D,] has the same functional form r (as a function of D;) as the evolution equation of 
the discrete time quasispecies model, with r as defined in Equation ([2]): E[D;+i|D,] = r(D(). Our high 
level approach is to first show that Df+i is actually concentrated around E[D,+i|Df] . Using the Lipschitz 
continuity of the evolution function r, we can then chain these concentration results inductively to show 
that the evolution of D, is tightly concentrated around the evolution of the discrete time quasispecies model, 
which allows us to show that E [D,] converges to the quasispecies as A^ — )• oo. To illustrate the ideas involved, 
we consider the case L = 1. Here the two genomes are {0, 1}. After the replication phase in the t-th step, 
there are a^D^N copies of 0. For the /-th copy, let /?, denote the indicator variable for this copy being selected 

in the selection phase, so that S^ = L;!li' Rt- Since the /?;'s are not independent, we cannot directly apply 
a Chernoff bound. However, since they are negatively correlated, one expects concentration to hold, and 
this can indeed be shown using the so-called method of bounded differences. The same reasoning works 
for S], and thus we get that given D,, the intermediate population St after the replication and selection steps 
is concentrated around its expectation with high probability. We now look at the mutation step. Let M, be 
the indicator variable for the /th genome being after the mutation step. We then have A^O'*^! = L^iM- 
Since the M,'s are independent random variables, it can be shown using a Chernoff bound that given S?, 
DJ'^j is concentrated around E [D°^j |S,] = ^/n{ijlS^ + (1 ~ P^)^})- The two steps can then can be combined 
to show that given Df, D°_^j is concentrated around E [E [D°^i|St] |D,] = i/a'IE [/xSf + (1 - Ai)5/|D(] = 

^ J^n '^^ ' ■ The same reasoning works for D},, . 
With some more work, this argument can be generalized to work for arbitrary L. The concentration 
guarantee we obtain is of the following form: there are quantities £, and pt which are both o/v(l) such that 
given D,, |Df+i — E[Df+i|D,]| < £, with probability at least 1 — pt- In the next step, we chain these step- 
wise bounds inductively in order to remove the conditioning and show that for all t < to, D, is concentrated 
around mj. An important component of the induction is the observation that r is Lipschitz continuous, 
which allows us to control the propagation of the errors e/ in each step. By the induction hypothesis, we 
have that |D, — m,| < e/ with probability at least I — p'f, where £/ and /jj are both on {I)- Assuming the 
Lipschitz constant of r to be K, this implies that E [D,+i |Df] = r(D,) is within distance KSf of m,+i = r(m,) 
with probability at least 1 — p'^. Applying the convergence result from the first step, we then have that with 
probability at least 1— pj^j = \—p',—pt, \^t+\ — ni;+i| < £/+i =^e/ + ef of m,+i. The quantities /j^ e/ for 



? < ?o can be chosen to be OAr(l), which is sufficient to show the required convergence. The details appear 



in Appendix B.2 We now give an overview of the proofs of our computational results. 



Computing the stationary distribution in the class invariant case (Theorem |3.3| l. Recall that the state 
space of the RSM Markov chain is roughly N'^ . However, if the fitness function is class invariant, we can 
show that the number of distinct coordinates in the state space is about A^^. To see this, first we define an 
equivalence relation on the states of the RSM Markov chain. We say that f,g, which are functions from 
{0, 1}^ to {0, 1, . . . ,A'^} satisfying Y.afi'^) = ^ ^i^d £(j^(a) = A'^, are equivalent (denoted f = g) H they 
have the same statistics for every Hamming class, i.e., for every <i <N, 

{c7G{0.1}'-|vv«(CT)=r} {ae{0,l}'-\wH(a)=i} 

Thus, the state space of the RSM Markov chain gets partitioned into about (A'^+ 1)^+' different classes. 
Then, due to the fact that the fitness function is class invariant, it can be shown that the transition probability 
of / to any other equivalence class is the same as that of g to the same class. Hence, one only needs 
to compute the transition probability from one equivalence class to another. This probability is a large 
binomial sum and one has to be careful in its computation and keep track of the number of bits required to 
represent each entry of this Markov chain over the equivalence classes. Once we have the transition matrix 
of this Markov chain, one can compute its largest eigenvector which corresponds to the stationary state. We 
show that, if one does this carefully, one can compute the eigenvector in time roughly N'^^^'\ The details 



appear in Section B.5 



Algorithm to compute the error threshold. Once we have the ability to either compute the stationary 
state of the RSM process or derive independent samples from its stationary state (which allows us to estimate 
the relative frequencies of the genomes at stationarity with a good precision by taking an average of the 
sampled states), the algorithm to estimate the error threshold is simple. The idea is to start with a small value 
(^ 1/l) of n , and to estimate/compute the stationary distribution of the RSM process for the current value of 
/I . The algorithm then checks if the estimate of the stationary distribution is close to the uniform distribution 
on the genomes in the measure of closeness of one's choice. If so, it stops and outputs the current value of 
/I as an estimate of the error threshold. Else, it increases /x by a very small amount and repeats the above 
steps. In case direct computation of the stationary state of the RSM process is computationally prohibitive, 
independent samples from the stationary distribution of the RSM process are derived by simulating the RSM 
process up to its mixing time. The number of samples required can be estimated from a simple application 
of Chernoff bound on the random variable corresponding to the stationary state distribution of the RSM 
process. Hence, to establish bonds on the running time of the error threshold estimating algorithm, it is 
important to be able to bound the time it takes for the RSM process so that Df comes close to the stationary 
state, Doo. Our next result is towards this. 

Mixing time result (Theorem |3.4| l. Since the stationary distribution of the RSM chain is not very well 
understood, it is not clear how to apply conductance-based geometric tools or the canonical paths method 
(see, for example, IIJS88II ) in order to prove the mixing time result. We are thus led to more combinatorial 
coupling based methods. Here one starts with an integer valued metric d on the state space of the Markov 
chain, and then one runs two copies Xt and F, of the chain. To show fast mixing, it is then sufficient to prove 
thatX, and 7? can be coupled so that E[d{Xt+i ,Yt^i)\{Xt ,Yf)] < a < 1. In general, defining a coupling can 



be tricky because one needs to carefully argue that the marginals of the coupling agree with the original 
Markov chain. 

We define the coupling in two phases: the first phase includes the replication and selection steps and 
the second phase includes only the mutation step. We begin with the easier mutation step. Let It and 7, 
denote the state of the two RSM chains after the replication and selection steps. For most natural choices 
of the distance metric d, it is possible to couple the mutation step using the standard coupling for the 
random walk on the hypercube so that 'E[d{Xt+\,Yt+i)\{It ,Jt)] < (1 —2pL)d{It,Jt). The challenge however 
lies in controlling E[(i(/f,/,)|(X,,Ff)] while coupling the replication and selection steps, since because of 
the global nature of the replication and selection steps, 'E[d{lt,Jt)\{Xt,Yt)] can become quite large. We 
control this increase by a careful choice of the metric d, and by appealing to the path coupling methods 
of Bubley and Dyer ||BD97J. The path coupling theorem says that for integer valued d, it is sufficient to 
ensureE [(i(X,+i,7f_|_i)|(X(,yf)] < a < 1 onZj for states Xj and F, satisfying (i(X,,F,) = 1 in order to establish 
fast mixing. Our coupling is then defined as follows. Fix a permutation of the N genomes in the chain 
Xt'. d{Xt,Yt) is then the minimum over all possible permutations of the N genomes in Yt of the sum of the 
Hamming distances between the genomes at the same positions in the two permutations. The main technical 
step is to show that for this d, the replication and selection steps can be coupled in such a way that starting 
fromXf and Y, satisfying d{Xt,Yt) = 1, E[d{It,Jf)\{X,,Y,)] after these steps is at most jf^i '^^^a" ^- ^^ ^^ ^"^ 
this step that we use the form of the distance metric d crucially; the details of the coupling are somewhat 
technical and involve arguing carefully that the coupling is valid, and are given in Appendix B.4| We then 



combine this with the coupling for the mutation step described above to show contraction in the expected 
distance under the condition (1 -2/i)jy^ '™^"^'" L < 1. 

4 Discussion and Future Perspectives 

4.1 Previous Work 

The notion of the quasispecies and the existence of an error threshold were recognized first by Eigen and his 
coworkers in the 1970s and 1980s | Eig71[|EMS89i Translation of these ideas into intervention strategies 



requires overcoming two key limitations of the quasispecies model. First, the model assumes an infinite 
population size, whereas realistic population sizes can be quite small. With HIV, for instance, the effective 
population size is estimated to be ~ 10^ — 10^ IKAB061IBSSD111 . Second, the theory assumes a single- 
peak fitness landscape, whereas realistic landscapes can be far more complex ||BCP"^04 , HMC^lll . Efforts 
over the last several decades have attempted to overcome these limitations of the quasispecies model IINS891 
IBS931 IWH96[ IWie97[ lAF98l [SH061 ITH071 IPMnDlOII (see Wilke IIWil05ll for a recent review). The finite 
population case, however, has remained difficult to solve in full generality. Most studies resort to simulations 
or use approximate or heuristic approaches to describe the finite population case, and we discuss some of 
these here. 

Nowak and Schuster IINS89I used a birth-death process to model the underlying evolution in finite pop- 
ulations and using simulations predicted that the error threshold scales as 1/VN. Their model, however, 
does not converge to the quasispecies model as N goes to infinity. Alves and Fontanari |]AF981 present a 
model which employs a two-stage sampling with replacement in the selection process: first sampling uni- 
formly from the population, and then sampling from the obtained sample with biases proportional to the 
fitness. They note, however, that sampling with replacement destroys the negative correlation between the 
selection of two individuals of the same species induced by the finite population constraint when selec- 
tion is implemented using sampling without replacement. They find that the error threshold scales as 1 /N. 



Further, they only analyze a heuristic deterministic approximation to their model, and do not consider rig- 
orously the question of convergence of their original model to the quasispecies model. The closest to our 
convergence result is that by van Nimwegen et al. [ vNCM99 1 who show convergence to a deterministic 
Eigen-like dynamics but again employ sampling with replacement and use special cases of additive fitness 
landscapes. With finite populations, there can be a significant statistical difference between sampling with 
and without replacement, the latter (which we employ) being more realistic. It is well known that as A/^ — )• oo 
the difference between sampling with and without replacement shrinks, but then as we prove, so does the 
difference between our population genetics model and the quasispecies model. Their convergence proof has 
a similar structure as ours but we are able to use Chernoff bound type inequalities which are much stronger 
than the second moment inequalities used by them. Consequently, our convergence results are quantitatively 
stronger. We additionally prove convergence of the error threshold and fast mixing, questions not considered 
by ||vNCM99|. More recently, Musso IIMusllll presented the transition matrix for sampling with replace- 
ment in the case L = 1 and also claimed convergence to the quasispecies model in the deterministic limit. 
No attempt, however, is made in IIMusllll to make this latter claim rigorous. Another class of studies relies 
on approximations and heuristics inspired from physics, and in particular statistical mechanics, to render the 
finite population case mathematically tractable (e.g., |BK98. SRA08. PMnDlOj). 

While previous studies have focused extensively on the fractional distribution of genomes at stationarity, 
little is known of the time to reach the stationary state. Campos and Fontanari IICF99I show that in the 
limit of infinitely large genome lengths (L — )■ oo) and population sizes {N — )• oo) and with the single peak 
fitness landscape, the timescale associated with the decline of the master sequence is ^/\n(qa) where q is 
the probability that a genome is replicated without error, and a is the relative fitness of the master sequence. 
Further, they show that with finite populations, this timescale is proportional to ^/N. The mixing time when L 
and A'^ are both finite and when the fitness landscape is more general than the single peak remains unknown. 
The latter mixing time has practical significance in the modeling of the action of mutagenic drugs, as it 
respresents the duration of therapy required to ensure completion of the transition to the error catastrophe. 
Our study presents conditions when the mixing is rapid and hence the transition to error catastrophe occurs 
quickly. Further, for computational studies that attempt to realize this transition in silico, our study presents 
an algorithm that allows efficient Monte Carlo sampling-based estimation of the error threshold. 

4.2 Applications of the RSM Model 

The motivation behind the RSM model and the algorithms discussed here is to get a basic framework for 
understanding the evolution of viruses of current interest such as HIV. Making concrete predictions relevant 
to the clinical setting requires super-imposing the specifics of the viruses of concern on the present frame- 
work. This often involves subtle modifications of the RSM process along with validation against data. For 
example, in related recent work, two of the authors and their co-workers applied the RSM model to mimic 
the within-host genomic evolution of HIV-1 IITBVD121 . It has been shown before MBSSDlll that these 
simulations quantitatively capture data of the evolution of viral genomic diversity in patients over extended 
durations (~ 10 years) following infection and the approach is extended in I.TBVD12 1 to estimate the error 
threshold of HIV-1. We envision that similar adaptation of our model will prove useful in elucidating the 
evolution and treatment guidelines for other asexual haploid organisms of interest. 

4.3 Critique of the RSM Model 

We note that our structural and computational results are independent of the nature of the fitness landscape 
so long as there are no lethal mutations (a^ ^ for any a). Our model, however, does not consider lethal 
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mutations. While letting some a^ be does not affect the quasispecies model due to the constant rescal- 
ing involved, it introduces an absorbing state in the RSM Markov chain, thus making it non-ergodic, and 
causing the population to eventually decrease to zero. While Wilke and others [ Wil05. ,WK93[ ITH07II have 
commented on the role of lethal mutations in extreme cases, establishing their full implications lies beyond 
the scope of the present paper. Although lethal mutations do occur, it turns out that in many important 
scenarios such as the evolution of HIV- 1, a Hamming class invariant landscape without lethal mutations ap- 
pears to capture key features of the underlying fitness interactions l,BCP^04il , rendering our RSM framework 
applicable. 

Finally, we note that our assumption of a fixed population size, N, is consistent with the widely accepted 
population genetics-based models of evolution, where a constant effective population size is employed to 
quantify the strength of stochastic effects IIHC061 . Note that allowing N to vary with time (generations), 
does not increase the complexity in our model. The distinction between an infinite population model and 
a finite population model arises from the culling of the population in the latter model in order to maintain 
a finite population size. A fixed N or varying N will only result in different extents of culling in different 
generations, but will not change the overall structure of the model. The advantage in keeping N constant for 
our present study is that it allows easier examination of the convergence to the quasispecies model. 

4.4 Open Problems 

Our study of the quasispecies and RSM models has revealed several interesting and important problems. We 
list the main ones here. 

Structure of the Quasispecies. Perhaps the most attractive feature of finite population models as opposed 
to the quasispecies model is that they can be used to study the effect of random genetic drift on inter-patient 
variations. Inter-patient variations in disease progression and response to treatments are known to be signifi- 
cant with HIV infection ||NBS"'"98llGKB"'"d5l . The collection of viral particles in an infected individual may 
be thought of as one realization of the random viral evolutionary process, and lim,^ooVar [Df] then pro- 
vides an estimate of inter-patient variations in viral evolution due to the effect of the finite population size. 
Thus, in addition to the structure of the quasispecies in the finite population case, defined by the expected 
frequencies lim^^co IE [Df] when N <°°, the variance of the frequencies lim^^co Var [Df] as a function of the 
population size N is also an important quantity to be studied. 

Error Threshold. In the quasispecies model with the single-peak fitness landscape, pic has been found, 
without a rigorous proof, to be 0{\/L), so that an error catastrophe occurs for /i ^ 0.5 (e.g., see IIEMS89II ). 
Further, the transition is sharp, so that a small increase in /x from below to above pic induces a dramatic 
change in the quasispecies structure. With other fitness landscapes, such as the multiplicative landscape, 
however, the quasispecies approaches the uniform distribution gradually as /x approaches 0.5 nWH96| . Fur- 
ther, lethal mutations, where a^ = for some a's, appear to show the existence of an error threshold only 
if multiple mutations in a single replication are allowed [ WK93I |TH07[ 1^10511 . Thus, the conditions under 
which a sharp transition leading to an error catastrophe at jXc <C 0.5 would occur remain to be established. 
Second, the dependence of jXc on N remains to be identified. While some simulations suggest a \/\/N de- 
pendence IINS891IBS93L others find the dependence to go as 1/A^ IIAF98II . As pointed out before, knowledge 
of pLc for finite A^ is important in the modeling of antiviral strategies based on mutagenic drugs. 
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Mixing Time. The main outstanding question here is to get a tight bound on the mixing time of the RSM 
Markov chain for a full range of evolutionary parameters. We notice that our result shows a good mixing 
time bound only under certain conditions on the parameters. Though we conjecture that the chain is rapidly 
mixing for other values of the parameters too, we believe that novel methods would be needed to extend 
our results in this direction. Apart from being useful in determining the time required for simulations to 
produce samples from the stationary distribution, the mixing time bounds also have biological significance. 
For example, when modeling the effect of a mutagenic drug under the RSM model, the convergence rate 
would models the minimum required duration of treatment before the error catastrophe occurs. 

5 Formal Statements of Main Results 

5.1 Preliminaries and Definitions 

In this section we present rigorous statements of our results. Several definitions may be found repeated here 
in the interest of the readability of this section. We recall that genomes of length L are denoted by L-bit 0-1 
strings. We will denote the Hamming distance between genomes a and T by dn (<7,t), and the Hamming 
weight of a genome a by wh {o). A population is defined as a multiset of genomes of the same length. 
While discussing the RSM model, we will fix the size of the population to be N. 

The Markov Chain for the RSM Model. We will denote the evolution of the RSM process using a time- 
indexed sequence of vector valued random variables (N,)J1q. The entries of N, are indexed by genomes a, 
and the entry N^ denotes the number of genomes of type a at time t. At every time t, Y.ae{() n'-^t^ ~ ^■ 

def 

The random variables Df = J^° /n denote the. fractional population of the genome a at time t. 
Reproduction Step and the Fitness Landscape In the reproduction step, each genome a produces aa 

def 

copies of itself, so that the number of genomes of type o after this step is I^ = aaNf , and the 

total number of genomes is If = Y,a ^aNf . The matrix A defined by Afj^ = a^ and A(jt = for a 7^ t 
is called the. fitness landscape. The fitness landscape is said to be class -invariant if a^, depends only 
on the Hamming weight of a. By a slight abuse of notation, we will denote by a, the fitness of all 
genomes with Hamming weight / in the class invariant case. 

Selection and Mutation Steps and the Mutation Rate In the selection step, N genomes are sampled with- 
out replacement from the genomes obtained after the reproduction step. In the mutation step, each bit 
of each of the A'^ genomes obtained after the selection step is flipped with a probability /x, called the 
mutation rate. The mutation transition matrix Q defined by Qa^ = }X'^"^° ■■'^\\ — ^)^^^«(^'^) gives the 
probability that a genome of type a mutates to one of type T in the mutation step. 

The RSM process as described above is a Markov chain on the state space of functions / : {0, 1} — > N, 
satisfying Lffg/o \\^ fi^) — ^- The transition matrix Ji( of this chain is described in SectionlAJ 

Fact 5.1. When the mutation rate /l G (0, 1) and aa > Ofor all <J, the Markov chain ^ corresponding to 
the RSM process is ergodic, and hence has a unique stationary distribution. 



This is a simple consequence of the fact that /I and A are positive. See Section B. 1 for a proof 
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Important Statistics of the RSM Process and the Projected RSM Process. The transition matrix ^ 
of the RSM Markov chain is of dimension ( ^ ) ^ ( /v ) • When the fitness landscape A is class 
invariant, one can get a projected Markov chain with a significantly smaller state space which can still be 
used to compute the average fitness and the average population of each fitness class at stationarity. Consider 
equivalence classes indexed by functions h : [0,L] — > N with J^f^oh^i) = N, such that a function / in the 
state space of ^ is in the equivalence class [h] if and only if for every / G [0,L], Y^faeio il^lw (f7)=;1- fi^) ~ 



h{i). We then have the following lemma the proof of which is in Section B.l 



Lemma 5.2. Let f,g belong to the same equivalence class h as defined above, and let [h'] be another 
equivalence class. We then have ^{f, \h']) = ^{g, \h']). 

Thus, we can consider the projected Markov chain ^„ with state space 

^w = < [h] 




Notice that |n».| = ( ^ ) ■ Also, if %„ is the stationary distribution of ^w, then by the projection property 

^m = I n{f). 

This property implies that the expected populations for every Hamming class of genomes and, hence, the 
expected average fitness at stationarity, are the same for ^^ and .^. 

Mixing Time. We will denote by 7i the stationary distribution of the RSM process, and let Nc» be a random 
variable distributed according to n. We know that the distributions of the random variables N, converge in 
total variation distance (and hence in distribution) to n, due to the ergodicity of the RSM process. We fix 
our notation for mixing times in this section. 

Definition 5.3. The total variation distance between two probability distributions 3l\ and ^2 on the sample 
space Q. is defined by\\&\-&2\\TV= max^cn | ^1 (A) - ^2 (A) | . 

Definition 5.4. Fix a Markov chain JV on a state space S. We define 

d(t) = max \\^'(a, •) -7i\\tv- 
aeS 

For < £ < 1/2, the mixing time of -jV is defined by 

def 

Tmix(e) = min{? : d{t) < e}. 

Notice that by the projection property, the mixing time of the projected RSM chain .J^w is at most the mixing 
time of the original RSM chain J^ . 

Error Thresholds. In the following definition, we specifically emphasize the dependence of the random 
variables N(,D, on pL,N by denoting them as ^t,^,N and Dt,n.N- We denote by Doo.^,a? a version of Dt,^,N 
distributed according to the stationary distribution of the RSM process, and by U the uniform distribution 
over genomes. Given a distance function d, one can define the error threshold with respect to d as follows. 
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Definition 5.5 (Error Threshold for the RSM Model). Let £ > 0. 

pii{E,N) = min{Ai G (0, 1) : d{¥. [D^,;,,iv] ,U) < e] , 

where U is the uniform distribution over all genomes of length L. 

Of particular interest is the function d'^ : for any two distributions S>i and ^2 over genomes, ^''(^1,^2) 
denotes |Lct^(<^)(^i {^) ~ ^2(o'))|. The corresponding jx will carry the superscript h. 

5.2 Convergence to the Quasispecies Model 

Our first main result is that the RSM model converges to the quasispecies model. 

Theorem 5.6. Fix a fitness landscape A with positive entries and a mutation transition matrix Q. Consider 
the RSM started with the initial state Dq and consider the evolution of the quasispecies model started with 
the initial state vhq = Dq. Then for any fixed time to, 

limE[D,jDo]=m,„, (4) 

where m^Q is the state of evolution of the quasispecies model at time to starting from mo- 



The proof of the above theorem is relegated to Section B.2 As a corollary to the theorem above, one can 
show that there is convergence of a finitary version of the error threshold iJ.'^{e,N) to the error threshold 
/i^(e) for the quasispecies model, as the population size goes to infinity. Formally, we have the following: 

Corollary 5.7. Fix a mutation rate /i < 1/2 and an error parameter e. For every 5 > 0, there exists a time 
?0 > such that for t > to, one can find an Ngj such that for N > Ngj, 

d^ (E [Df,^,Ar] ,U) > £ - 5, when jx < Hcie), and 
d^ (E [D,,^,A,] ,U) < £ + 5 when pi = ix'^{e), 

Here we use the subscripts /I andN to emphasize the dependence of the distribution ofDt on fX and N. 
Although we will prove our results for the error threshold in terms of the average Hamming distance, it is 



easy to translate our results to other common dispersal measures as described in Section B.1.1 The proof 



of the above Corollary follows easily from Theorem |5.6| and is given in Section |B.3 We note here that 



extending the above corollary to get convergence of finite population error thresholds depends upon proving 



a strengthened version of our convergence result (Theorem 5.6 1, which we leave as an open problem. In 
fact, on the basis of simulation results, we conjecture that for fixed £, iJL'^{e,N) monotonically increases to 

5.3 Computational Results 

5.3.1 Mixing Time Bounds on the RSM Process 



We give a coupling argument in Section B.4 which allows us to prove the following theorem. 



Theorem 5.8. Fix < /i < 1/2, and a fitness landscape A. Let 

„., xdef,, r, . N maxfjatj, 

K{A,ii) = {l-2ix)- — -^ L. 

N -\ mm^aT 

When K{A,pl) < 1, we have T^(£) = O ("g^ 
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5.3.2 Computing the Stationary Distribution in the Class Invariant Case 

Theorem 5.9. For every A which is class invariant, and }l which can be represented using b bits, there is an 
algorithm running in time T given by 

which computes Tlwfor the Markov chain ^„ described above. For fixed L,T = 0{N^^^ '). 



The proof of the above theorem appears in Section B.5 and is based on the projected RSM process discussed 



in Section [5TT] The above theorem immediately gives a grid-search based algorithm that given a grid resolu- 
tion 5 and £ > 0, outputs a approximation Ho to the error threshold in time T ■ ^/iS such that /Iq > IJ.c{£,N) 
and d^(J)^^^_§,U) > e. We now consider Markov Chain Monte Carlo based grid-search methods. 

5.3.3 Markov Chain Monte Carlo Methods 

The general strategy for Monte Carlo based grid search methods for determining error thresholds is described 
in the algorithm ErrorThreshold in Figurefllin the Appendix. We will denote the mixing time Tmix(£) 
for parameters L,A^, A and/i as x{L,N,A,pL,e). We consider the projected chain ^^ described above which 
contains enough information to compute the average Hamming weight, and whose state can be maintained 
as a tuple in {0, 1 , . . . , A^}^+^ 

Theorem 5.10. Let A be class invariant, and consider the error threshold }i^{e,N). Suppose the algorithm 
ErrorThreshold is run with input grid resolution 5, accuracy parameter 5i, and error probability &2. 
Let T be the maximum over k of the quantity T{L,N,A,kd, d\/{2L)) where k < 1/(25) is a positive integer 
The algorithm ERRORTHRESHOLD runs in time T s- 0( [1/(25)] A^LmaXtj a^), where 

■or4 

^log(2ri/(25)l(L + l)/52) 



and with probability at least 1 — Sz, produces an output jXq satisfying Ho > H^^s + 5i/2) and 



^''(D„.^_5,U)>£-5i/2. 



The proof of the above theorem appears in Section B.6 where we also point out some technical subtleties 
about the definition of error thresholds. 
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A Starting State and Transition Matrix of the RSM Markov Chain 

As stated before, the RSM Markov chain starts with the "fittest" possible population with all the weight 
concentrated on the master sequence, so that Nq = N and A',^ = for all a y^M. We now proceed to set up 
some notation for writing out the transition matrix ^. 

Definition A.l. (Multivariate Geometiic distiibution). Let g{o) denote the number of genomes of type 
O in an urn. Consider the process of choosing, without replacement, N genomes from this urn. Then 
phyp (^g _j. y;A/^) denotes the probability of obtaining f{o) genomes of type o for each o. We have, 

/'''-te^/;A^)^=''^^|^ (5) 

\ N ) 

Definition A.2. (Multivariate Binomial Distribution). Let f{o) denote the number of genomes of type 
O. Consider a stochastic process in which each genome of type o independently mutates into a genome T 
(possibly equal to o) with probability Q{o, t). We denote by P'"" (/ — )■ D; Q) the probability that D(a, t) 
genomes of type o mutate to type T under this process. We have 

^^/nni V<^D a,T TG{0,1} }/ ,,r„ni 



We can now write the entries of ^. For f,g:{0,\} — > N satisfying (/, 1) = {g, 1) = A'^, we denote by 
^{f,g) the conditional probability of obtaining g starting from / in one step of the RSM process. Given a 
function /{0, 1} — )■ N, we denote by Af the function such that Af{o) = aaf{o). Then, we have 

^{f,g)= £ P'^y^iAf^h-N) £ P'''"(/i^D;e), 

h:{h,\)=N D:lD=g;Dl=h 

where Q and A are as defined above. 
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B Proofs Omitted from Section |5] 
B.l Proofs Omitted from Section 15.11 



Proof Sketch of Fact 5.1 When /x E (0, 1) and aa > Q for all a, it can be verified easily that this chain is 
irreducible and also has a non-zero self-loop probability at every point in the state space. Thus, the chain 
is ergodic and hence by the Fundamental theorem of Markov chains, has a unique stationary distribution to 
which it converges as f — )• oo. D 



We now give a proof of Lemma 5.2 



Proof of Lemma 5.2 We will show that under class invariance, we can project the RSM Markov chain so 
that its state space consists of equivalence classes indexed by functions h : [0,L] — > N with Lf=o'^(0 = ^' 
such that a function / in the state space of ^ is in the equivalence class [h] if and only if for every / G [0,L], 

I no)=h{i). 

{ae{o,i}^\wH{<y)=i} 

We will find it convenient to consider the reproduction and selection phases separately from the mutation 
phase, show that the projection described above can be done for both of them, and then combine the two 
results using the following general fact about projected Markov chains, the proof of which we include for 
completeness. 

Fact B.l. Let P and R be the transition kernels of two Markov chains on the same state space Q., and let S 
denote the composition PR of the two chains. Suppose that there is a partition ofD. into equivalence classes 
Q!, such that for any f = f, and any equivalence class [g\, we have 

P{f, [§]) = P{f', [g]) andRif, [g]) = /?(/, [g]). 
Then, we also have S{f, [g] ) = S{f'{[g]), for all /, /' and g as described above. 
Proof The proof is by direct computation. We have, 

SifM = l,P{f,q')R{q'M 
q'eO. 

= I 'LP{f,q')RWA8]) 

lq]en' q'e[q] 

= I/?(^,[g])P(/,M) (6) 

[q]en 

= I /?(<?, k])P(/,M) (7) 

[q]ea 

Just as in the derivation of equation (J6|l above, we get S{f\ [g]) = T,[q]eQ.' P{l ^ [8])P{f\ [<?])> ^rid hence, by 
equation (jv]), we have S{f, [g]) = 5(/7[g]), as claimed. D 

In order to use the last fact, we now decompose the matrix of the RSM process into the following two 
Markov chains on D.: 
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1. The Reproduce- Select Chain. We denote the transition matrix of this chain as P, such that P{f,g) 
is the probability of obtaining the state g starting from state / after the reproduction and selection 
phases. Notice that 

P{f,g)=P'''^Af^g-N). 

Assume that A is class invariant and let A{i) denote the reproduction rate for a genome of Hamming 
weight /. For an equivalence class [h] of Q. as defined above, we consider the probabihty P{f, [h]), 
with / G [h']. By the definition of P'^^p (— )•; ), this is the probability of drawing h{i) genomes of Ham- 
ming weight / for < i < L, when N genomes are drawn without replacement from a bag containing 
A{i)T,a:wH{c>)=if{'^) =A{i)h'{i) genomes of weight /. By definition, this probability depends only on 
h and the equivalence class h' of/, and hence P{f, [h]) = P{g, [h]) when A is class invariant and f = g. 

2. The Mutation Chain. We will directly write down the entries /?(/, [h']) for the probability of obtain- 
ing a state in the equivalence class [h'] starting from a state / in the equivalence class [h] . We will 
show now that we can write R{f, [h']) in terms only of h and h', and hence /?(/, [h']) = R{g, [h']) for 
f = g. Denote by £2ij the probability that a string a of Hamming weight / transforms into some string 
of Hamming weight j in the mutation step, and notice that this probability is well defined because of 
the definition of the mutation transition matrix Q. Since / G [h], there are h{i) strings of Hamming 
weight / initially, for <i < L. Denote by dij the number of strings of weight / which transform into 
strings of weight j in the mutation step. Then, we have 

«(/.['■'])= L n L.m<j<L}) n ^f/- ® 

d:Y,jd^j=h(i)0<i<L \\"'yl" ^ i ^ ^J/ 0<j<L 

L.du=h'U) 

Since /?(/, [h']) depends only upon h and h', we get that /?(/, [h']) = R{g, [h']) for f = g. 



Combining the above two discussions and using Fact |B.l[ we see that when A is class invariant, the transition 
matrix ^ of the RSM process satisfies ^(/, [h']) = ^{g, [h']) whenever f = g. This completes the proof 
of Lemma 15. 21 D 

B.1.1 Relationships between Error Thresholds 

We first define error thresholds according to various dispersal measures. 

Definition B.2 (Error Thresholds). Let £ > 0, and U be the uniform distribution over the set of genomes. 

1. Atf''(e,A^) =min{jUG(0,l):||E[D„.^]-U||j <£}. 

2. pL^,{e,N) = minJAt G (0, 1) : |i:^w,/(a)E [D2,^] -2-'^ZaWH{o)\ < s}. 

3. iJ.f^{£,N) = min{/i G (0, 1) : |H(E [Doo,^]) — H(U)| < e]. Here H denotes the Shannon entropy, 
using the base e. 

Definition B.3 (Error Threshold for the Quasispecies Model). Let jj. G (0, 1) and let v^ denote the the 
stationary expected fraction vector with i.\ norm 1. The error thresholds are defined as follows. 

L Aif ''(e) = minJAt G (0,1) : ||v^-U||i < e]. 
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def 



2. Atc'(£) = minliU G (0,1) : \LoVi,io)wH{o) -2-^j:„WHio)\ < s}. 



3. IJ.f^{s) = min{/x G (Oil) • |H(v^) — H(U)| < e}, where H denotes the Shannon entropy, using the 
base e. 

Our results are mostly stated in terms of the error threshold jxl'.. However, we now describe how the different 
definitions above are related to each other. The following inequalities relate the different distance measures 
that we have considered. The first of these follows from the definition of the i\ norm, while the second is 
the well known Pinsker's inequality relating the i\ norm to the entropy. 



E 



Y,^H{o)D'i^^ 



■no^mi^"^ 



< 



l||e[d. 



-u 



||E[D„.,^]-U||j < -^2|H(E[D„^])-H(U)| 
This gives us the following relationship between the error- thresholds: 



(9) 
(10) 



(11) 



Using the fact that the distributions involved are defined over a state space of size 2^, we can show the 
following weak converse to inequality ([TO]): 



|H(E [Doo,^] ) - H(U) I < 2^ I |E [Doo,^] - U| | ^ . 
This gives us a further relationship between the error thresholds: 

iUf''(e,A^)>Atf(2V,A^) 
However, we notice that one cannot in general close the loop in inequalities 



and (10 1 (and hence in 



inequalities (111) above. To see this, consider for example the following two distributions P and Q for 
L>1. 

1 . P: puts total weight 1 — e on weight 1 strings and weight e on the string 0, so that the average 
Hamming weight is 1 — £. 

2. Q: puts total weight (1 — £)/L on weight L strings, and weight 1 — (1 — £)/L the string 0, so that the 
average Hamming weight is still 1 — £. 

The average Hamming weight in both cases is 1 — e, so that in that metric, the distance between P and Q is 
zero. However, the total variation distance between P and Q is at least 1 — £. 

B.2 Proof of Theorem g^l 

In the rest of this section, we will use the following concentration inequalities about the multivariate hyper- 
geometric distribution: 

Fact B.4. Consider the hypergeometric distribution P^'^'P {g — t- /;A'^) defined in equation (pi) above. Let D^ 
be the random variable denoting the fraction of genomes of type O which are drawn in the process starting 
with g{z) genomes of each type z. We then have: 
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1. E[Z)-] = g}- 

2. The following concentration inequality holds for £ > 0: 

F [|D^ - E [D''] \>£]< 2exp {-e^N) . 

Similarly for the multivariate binomial distribution, we have the following: 

Fact B.5. Consider N genomes with f{o) genomes of each type o. Let D^ be the random variable denoting 
the fraction of genomes of type O after a mutation step. Then 

1. nD''\ = h^af{m^a. 

2. The following concentration inequality holds for e > 0/ 

P[|D''-E[D^]| >£] <2exp(-2£2A^) 



Fact B.4 is a consequence of Azuma's inequality, and a proof can be found in the book by Dubhashi and 



Panconesi [DP09|. Fact B.5 is essentially a restatement of the Chernoff-Hoeffding bound. Combining the 
above bounds, we can deduce the following concentration inequality for each step of the RSM process: 

Lemma 6.6. Consider a state "^i of the RSM process. We then have 

1. E [Df_^i|Nt] = '^^ = ^-''(Dt), with r" as defined in equation (il). 

2. Let El and £2 be arbitrary positive constants. Then with probability (conditional on Ntj at least 
1 - 2^^+' (exp {-eJN) + exp {-le^N) ), we have |Df^i - E [Df_^i |Nt] | < (d + £2) for every a. In 
particular, choosing £1 = £2 = £/2, we get that with probability (conditional on Nt) at least 1 — 

22^+2 exp {-e^N/4), we have \Df_^^ - E [Df_^i |Nt] | < efor every a. 

Proof. For ease of notation, let g^ = a^N^ . Let I^ be the random variable denoting the fraction of genomes 
of type a left after the selection step. Thus, we have 



E[ 



[/"IN,] = ^ andE [Df+i|I,N,] =Y^l'Qr,o. 



Using a union bound over all genome types with the concentration inequality in Fact B.4[ we get that 
with probability at least 1 — 2^+^ exp (^—eJN) conditioned on Nj, we have 



re 



ga 



(g,l) 



< £1, for all a. 



We denote the above event by (o. Now, we consider the concentration of D, conditioned on I. Using a union 



bound over all genome types along with the concentration inequality in Fact B.5 we get with probability at 
least 1 — 2^+^ exp [—le^N) conditioned on I, we have 






V'Q. 



< £2, for all a. 
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We denote the above event by ^ . With probabiUty at least 1 — 2^^+^ (exp (^—e'^N) + exp {^—IeIn)), condi- 
tioned on N;, both S" and ^ occur, and then we have, for all a. 



|AV-E[AV|N, 



l!2rcT(A+l 



gx 



< 



e2 + £i£2TC7 = ei+£2, 



r 



^T 



r,l) 



D 



which is what we sought to prove. 

Before proceeding, we need the following lemma: 

Lemma B.7. Fix a fitness landscape A with positive entries and a mutation transition matrix Q with /x < 1/2. 
The functions r^ defined in equation (|2]) are Lipschitz with Lipschitz constant 

maxrar 

K = 

mintaT 

on the set of probability distributions over genomes. 

Proof. For a probability distribution x over genomes, we have 

dr'^'ix 



((l-At)^-M") 



dxa 



< 



max^aT 



Qc 



-r-(x) 



((i-m)'-m") 



ram^^az 

where the last line follows by noticing that fact that for all x and all o' , min^ ^ Q^x < r^ (x) < maXfj ^ Qax, 
and mint7T Qax = M^, while maXfjT Qax = (1 ~ M)^- Thus, by the mean value theorem, for any probability 
distributions x and y over genomes, we get 



|r(x)-r(y)| <^||x-y| 



I • 



D 



Proof (of Theorem 5.6). Fix a time fo- In the rest of the proof, we drop the conditioning on the initial state 
being concentrated on the master sequence for ease of notation. We will prove the following claim by 
induction for < ? < ?o: 

Claim B.8. For every o £ {0, 1} and <t <to, there exist If .,uf and pt satisfying the conditions 

def 

i. < If < uf < 1, and Sf = maXfj uf — If and pt are oa?(1). Also, mf lies in the interval \lf .,uf]. 
2. With probability at least 1 — pt, Df lies in the interval [If , uf] for all <J. 



We first see how to finish the proof of Theorem 5.6 assuming Claim [R8 From item|2]in Claim B.8 and 
using mf G [^/^,Mf ] ,we get 

\E[DZ]-mfJ<pr, + i\-pr,)K-lfJ,ioTallCJ. (12) 

Now item [T] of the claim implies that the right hand side of equation (12 1 goes to as A/^ — )• oo, which 
concludes the proof of Theorem 5.6 assuming Claim B.8 D 
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We now proceed to prove Claim B.8 



Proof of Claim B.8 At f = 0, we can set If = uf = mf , and pt = 0. By the definition of tiie starting state, 
tiiis satisfies tiie conditions claimed in the claim. Now suppose that we have shown that with probability 
1 - Pt, we have Df £ [lf,uf] for all a. We call the latter event S't. Recall that 

E[Df+i|D,]=r^(D,), 

and define 

l'f_^i = min r'^(y); m',+i = max r'^(y) 

Notice that mf^y G [Z',^i,M'f°^j]. Also, because of the Lipschitz condition on the function r^ shown in 
Lemma B.7 we have u'f_^y — l'f_^y < l^Ksf. Now, we condition on the event £{ defined above, and in this 
case, we have 

E [A+il^,] G [l'r+i,u%,] , for all C7. 

Choose £(A^) = A^"'/^ = Oiv(l), and set //^^ = rf_^y - e/2, uf_^y = u'f_^i +e/2. Using the concentration 



result quoted in Lemma B.6[ we get that conditioned on S't, with probabihty at least 1 — p{N) where p{N) = 
exp(-n(A^i/3)) = o;v(l), 

A+ie Ki,«r+i]>foralla. 

Now, we saw above that (at occurs with probability at least I — pt- Hence, by a union bound, we get that 
with probabihty at least 1 —pt+i, where pt+i = pt + p{N), 

A^'+ieKi^Wml-foralla. 

This proves the induction hypothesis, except that we need to make sure that 5, , p, are oa^ ( 1 ) . We first consider 
St. From above, we have the following recurrence for St'. 

St+i<2^Kst + e{N)\ sq = Q. (13) 

This satisfies St = Oj^{e) = on{\) for all t < to, by the choice of e. Similarly, we have pt = tp{N) = on{1) 



for t < to by the choice of p{N). This proves Claim B.8 D 



B.3 Proof of Corollary ^ 

We begin by noticing that for < /x < 1/2, we can choose a time ?o such that for t > to, the state nif of the 
quasispecies model satisfies 

\d'\mt,U)-d\y^,U)\<5/2, (14) 

where v is the unique stationary vector of the quasispecies model. Now fix ? > ?o- Since the distance function 



d^ is continuous. Theorem 5.6 allows us to choose an Ng such that for N > Ng, 

|^''(m„U)-/(E [D,,^,;v] ,U)| < 5/2. (15) 



Combining equations ( [T4| ) and ( |T5| ), we get 

|j''(v,U)-/(E[D,,^,^],U)|<5. 

Thus, when jj. < pL^{e), we have 

d" (E [D,,^,/v] ,U) > £ - 5, when ^i < n^e), and, 

and when /I =/x/'(£), 

d'' (E [D,,^,;v] ,U) < £ + 5 when n=^'Xe). 
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B.4 Proof of Theorem |U 

Before proving Theorem |5 .81 we first set up some notation for the coupling argument. 



Definition B.9. A coupling of two probability distributions Sl\ and ^2 is a pair of random variables (^, F) 
defined on a single probability space such that the marginal distribution ofX is '3l\ and the marginal distri- 
bution ofY is ^2- 

Definition B.IO. A coupling of Markov chains with transition matrix ^ is defined to be a process (^r, J'r )JLo 
with the property that both (Xf) and (Yf) are Markov chains with transition matrix ^, although the two 
chains may possibly have different starting distributions. 

Any coupling of Markov chains with transition matrix ^ can be modified so that the two chains stay 
together at all times after their first simultaneous visit to a single state: more precisely, such that if X^ = F^, 
then Xt = Yt for t > s.\n the following, we only consider such couplings. The following well known facts 
are the basis of coupling based methods for proving mixing time bounds. 

Tfieorem B.ll. Let {(Xj ,Ff)} be a coupling satisfying the definition above for which Xq = 0; and Yq = fi. 

def 

Let Tcoupie be the first time the chains meet: Tcoupie = min{f :Xt = Yt}. Then 

||^'(a,-) -^'(i8,-)||rv < IP [Tcoupie > f|Xo = a,Fo = j8] . 

Lemma B.12 (Coupling Lemma). Let X,Y be random variables defined on a finite sample space D. and let 
'W be any coupling ofC and Y. Then 

mmF'^[X^Y] = \\X-Y\\Tv- 
% 

Definition B.13. Let d : D.x D. — > M>o be a distance metric on the state space Q. of the two Markov chains 
{Xt}t and \Yt}f Suppose ^ is a coupling such that for every t >0, 

lK[d{X,+i,Yr+i)]<e-E[d{X„Yr)] 

for every starting distributions Xq, Fq, then we call 'rf a {6,d) coupling. Note that this implies that 

E[fif(X„F,)]<0'-D, 

where D = r[iaXa.ieQ.d{o,x). 

Fix a integer valued distance function d. Let let {X,}, be a realization of the Markov chain starting from Xq 
and F be another realization starting from the stationary distribution n of the Markov chain. If '^ is a (0 , J) 
coupling, then it follows from the Coupling Lemma that 

Coupling J j„fp„rQl Markov 

||X-7r||ry < P[X / F] '^ =^"'p [J(X,F)] > 1] < E[J(X,F)] < 6' -O- 
This implies that the mixing time Tniix(£) = O i °^ {\il] ) when < 1. 

B.4.1 A Coupling for the RSM Process 

The coupling ^ we will construct will have two independent parts ^ = ('^5,'^m), "^s for the Reproduce- 
Select phase and 'Iom for the mutation part. We first describe the somewhat simpler mutation coupling. 
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Mutation Coupling ^. Let Xt = {0\ ,..., Gn} and Yt = {vi , . . . , v^}- Let M denote an arbitrary per- 
mutation on [A'^] such that M{i) denotes the image of / € [A'^]. We define the distance between the states 
as 

def f ^ 1 

The mutation coupUng follows the following algorithm, with M set to be the permutation which achieves 
the minimum in the above definition. 

1. For / = !,..., A^ 

(a) For j = l,...,L 

i. Choose independently and uniformly at random ry from [0, 1]. 
ii. li OiU) = VM{i){j) 

A. Flip Oi{j) and VM{i){j) if and only if ry > 1 — /x. 
iii. Else 

A T ^ def J def ^ 

A. Let ry 1 = ry and ry.2 = 1 — 0- 

B. Flip ai{j) if and only if ry i > \— jj.. 

C. Flip Vm(,) (j) if and only if ry^2 > 1 - M . 

Lemma B.14. For jj. < 1/2, "^m i^ a ((1 — 2/x),(imatch) coupling. 

Proof. To prove that "^m is a valid coupling one just needs to note that if r is distributed uniformly at random 
in [0, 1] then so is 1 — r. Hence, for / = 1, . . . , A'^ and j = 1, . . . ,L, each bit Oi{j) (respectively, v,(j))) flips 
with probability exactly ji . Further, these flips are independent by construction. 

To prove that "^m is a ((1 — 2/x),(imatch) coupling, let Xt,Yt be the states of the two Markov chains with 

def 

distance d = d^axchi^t i^^t) ■ By definition of fi?match, there is some matching M* which achieves d. Without 
loss of generality assume that M* is identity, i.e., M*{i) = i, for all \ <i < N. Hence, Y4L1 dniOi, V,) = d. 

Let Xf+i = (a{+\ . . . , a'^^ ) and Yt+i = ( v|+ V • • ; v^^' ) be the output of '^m on input (ai , . . . , Gn) and 
{Vi,...,Vn) respectively. We will calculate E [T!iLidHi<y'i^^ ,v'i^^)] and show that it is exactly (1 -2/x) -J. 
Hence, (imatch(^«+i,if+i) < (1 — 2/x) -d, as (imatch is defined as the minimum over all possible matchings. 
By linearity of expectation it is sufficient to show that for all / = 1 , . . . , A'^, 

E[dHio'+',Viit + \)]=i\-2^)-dHiOi,Vi). 

Again by linearity of expectation it is sufficient to show the following: 

E,^. [dH{ar\j),vj+\j))] = {\-2n)-dH{ai{i)MJ))- 

This follows from observing that if Oi{i) = Vi{j), then P [ct/^^(j) = v,'+^(7)] = 1, while if ai{j) / v,(j), 
then, as ^ < 1/2, P [a/+' (j) = v;+' (j)] = 2i^. Hence, P [a/+^ (j) / v,'+^ (j)] = 1 - 2;U. This completes the 
proof. 

D 
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Coupling "^5 for the Selection Process. We again consider two states Xt = {ai,a2,...,o>/} and Yt = 
{vi , V2, . . . , Vn}- Our distance function is still Jmatch defined above. We first note the Jmatch (t) is actually 
a metric. 

Lemma B.15. Jmatch (•, •) is a metric. 

Proof. By construction d^^tchO^ ,Y) > with equality if and only if X = Y. Now consider states X = 
{<^(},=i jY = {'Ci}j^i and Z = {v,},-^j in the state space Q.. Let a and j8 be permutations of [A'^] such that 



N N 

Y, ^H {Oi, Taii)) and Jmatch {Y,Z) = ^ 
i=l i=\ 



^match (X , F ) = ^ J// ( a; , To; (,•) ) and Jmatch (F, Z) = ^ J// ( T; , V^g (,) 



Now we have 



N 
^match(^,-Z) < ^Jf/(a;,Vp(a(,-))) 
r=l 

/V 

i=\ 
= "match (^ 5 ^ ) + "match (^) -Z) . 

D 
We will use the following general path coupling result of Bubley and Dyer IIBD97II to define the coupling 

'las. 

Theorem B.16 (Path Coupling IIBD97II ). Consider a Markov chain M on state space Q. and a distance 
function d on D. such that d'{x,x) = Ofor all x £ Q.. Consider a connected undirected graph G on Q. such 
that the length of each edge {x,y}, if present in G, is d{x,y), and let d' be the shortest path metric on G. 
Suppose there exists a coupling '^ for M such that for some (X <\, and all Xf , F, G H which are adjacent in 
G, 

E.^[fif(X,+i,F+i)|X„F] < ad{X,,Yt). 

If every edge of G is a shortest path under the metric d' described above, then the coupling '^ can be 
extended to a coupling '^' such that 

^^g, [/(X,+i,F,+i|X„F,] < ad!{X,,Yt). 

forallXt.Yt^Q.. 

We will first show now that the path metric resulting from an application of the above theorem to Jmatch 
is c?match itsclf, sincc this is crucial for composing the '^5 coupling with the coupling '^m described above. 

Lemma B.17. Consider the state space D. of the RSM Markov chain ^ . Let G be the graph on D. in which 
two vertices X and Y are adjacent if and only ifdmatch (X,F) = 1. Then the path metric d' constructed in 



Theorem B.16 is identical with d, and each edge in G is a shortest path. 



Proof. For brevity we will denote Jmatch (•, •) by d. Notice that since each edge is of length 1, it is also a 
shortest path by construction. Since J is a metric, we also have d'{X,Y) > d{X,Y) for all X,Y G D.. We 
now proceed by induction to show that d{X ,Y) > d' {X ,Y) ior all X ,Y £ D.. Notice that when d{X,Y) = I, 
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this is true by definition of d' . Now, suppose that d{X, Y) <k—\ impUes d'{X,Y) < d{X,Y), and consider 
the case d{X,Y) = k > I. We claim that there exists a Z such that d{X,Z) < k — I and d{Y,Z) < 1. The 
existence of such a Z implies using the induction hypothesis that 

d'{X,Y) < d'{X,Z) + d'{Z,Y) 

< d{X,Z) + d{Z,Y)<k = d{X,Y). 

It only remains to construct such a Z. Let X = {cJ,};^i and Y = {v,},^j. Without loss of generality, we 
may assume that d{X, Y) = Y4L1 dn (cJ,-, V,). Since d{X,Y) = k> I, there exists a j such that dn {Oj, Vj) > 1. 
Let s be a string obtained by flipping a single bit of Vy such that dn {<yj,s) = dn {Gj,Vj) — 1. Now let 
Z = {T;};^j, where T,- = v, for / / j and tj = s. By construction, d{Y,Z) < 1 and d{X,Z) <k— I. 

D 

Claims about the Coupling. Suppose we find a coupling C for the selection phase such that when 
"S^match {Xt,Yt) = 1, then the intermediate states I{Xf) and I{Yt) satisfy 

E[drn.tch{I{Xt),nY,))\X,J,]<a, 



then using Theorem B.16 and the coupling for the mutation phase described above, we get 

E[d^,,,i,{Xr+i,Yt+i\Xr,Yr)] < a{\ -2^)dm.tch{X„Yt) ■ 

This will give us fast mixing as long as a(l — 2/i) < 1. 

We now describe such a coupling for the selection process, for general X, and Yt, which we will analyze 
only in the simple but sufficient case when (imatch {Xt,Yt) = 1. Suppose that Xt and F, contain, respectively, 
w^ and «CT genomes of type a. After reproduction, the number of genomes of type a in the two chains 
is aau^fj and aan^ respectively. Let the total number of genomes be Mx = Lcr'^CT^a ^"^^ My = Y^a^on^a 
respectively. Without loss of generality let us assume that Mx > My, and set M = Mx. 

We now construct a bag of M balls as follows. For each a, the bag has exactly flfjmin (n^,?!^) balls 
with label (a, {x,y)). If «^ ^ "cf then the bag has exactly afj(?i^ ~ '^ct) balls with label (cJ,jc), otherwise it 
has exactly aa{n'(, — rf^) balls with label {o,y). Thus the total number of balls in the bag is M. 

We now take a random permutation of the M balls, and take the intermediate state Ix (respectively, /y) 
to be the multiset of genomes given by the first N balls carrying the label {x,y) or x (respectively, {x,y) or 
y). Notice that a ball carrying a label {x,y) can contribute a genome to both Ix and ly. 

Claim B.18. The above coupling is a valid Markovian coupling for the selection phase of the RSM chain. 

Proof. Notice that sampling without replacement a objects from a set of b objects is equivalent to taking the 
first a elements from a uniform random permutation of the b objects. Also note that given a subset 5 of a set 
of b objects, and a uniform random permutation a over the b objects, the restriction of a to the elements of S 
is a uniformly random permutation of the elements of S. Now consider the set of M labeled balls constructed 
above, and define Sx (respectively, Sy) to be the set of balls carrying a {x,y) or x (respectively, x{x,y) or 3^) 
label. By the observations above, see that the set Ix (respectively, ly) has the same distribution as if it was 
sampled without replacement from Sx (respectively, Sy). This proves the claim. D 

Lemma B.19. Suppose (imatch {Xt,Yt) = 1. Then under the above coupling, we have 

„r , N , 1 1 maxflfT 

E Jmatch iixM%M< - — r —^L. 

1 - ^ mmacj 
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Proof. For brevity, we will denote Xt and 7, as X = {(7,},-^j and Y = {v;};^i and (imatch (•, •) by d. Since 
d{X,Y) = 1, we can assume without loss of generality that a,- = V; for / > 1, and 0\ and Vi differ in exactly 
one bit. 

We now consider the coupling described above, and let S be the set of balls with label {x,y). Notice that 
l^l = Lkkw'^ct,- Notice that l^l > {N — l)mint7fl'cy. We assume without loss of generality that Ua^ > ^vp 
so that the number of balls M in the bag is 151+ a(y^ . Consider a random permutation a of the M balls, and 
let / be the (random) multiset of balls with label {x,y) occurring in the first N positions of a. Notice that 
l^x n /y I > |/| . We observe that if the intersection of Ix and ly (seen as multisets) is of size at least |/| , then 
(^^match {Ix,h) <L{N-\I\), hence we have 



Now, we have 



K[d^^tch{Ix,l¥)\X,Y]<L{N-E[\I\ \X,Y]). (16) 



> n\i-^]>nI\ ''''' 



S\ J \ {N — I) min^ Ut: J ' 



Plugging this into equation ( (T6| ), we get 

E[dm,t,Uh,lY)\X,Y] < 



LNa 



0\ 



{N - l)minTat 



1 maxr Qr 
< 1 t J 

1-^ min^flT 
which establishes our claim. D 

Using the above discussion, we get fast mixing under the condition that 

1 maXffflfj 

(l-2u) , ■ L< 1. 

1-^ mincrflCT 

Formally, we have 

Theorem B.20. Fix a mutation rate }X <\/2, and a fitness landscape A. Define 

1 maXfyafT 

When K{A,pL) < 1, we have TmUs) = O (^g^ 

We note here that the main difficulty in designing coupling arguments for the RSM process is the distance 
expansion property of the reproduction and selection phases. Given two states of the RSM process, the 
reproduction phase amplifies the distance between them, and the nature of the selection phase tends to 
keep this distance intact. In this setting, the chain can be described as a noisy random walk on a boolean 
hypercube, and our bound reflects the intuition that when the noise is small, the fast mixing property of 
the hypercube should able to enforce fast mixing of the RSM chain. We reemphasize that we consider that 
achieving a better understanding of the mixing properties of the RSM walk, in terms of both upper and lower 
bounds appears to be an interesting and challenging open problem. 
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B.5 Proof of Theorem ^M 



We now proceed to prove Theorem 5.9 We will use notation similar to that used in the proof of Lemma 
5.2 in Appendix B.l By a slight abuse of notation, we will use P and R for the transition matrices of the 
projected versions of the chains described in the proof of Lemma [S!2] above. We note that for any numbers M 
and M[,M2, ...,M'i summing up to M, (^, ^, ^,) is of size at most O(MlogM) bits and can be computed 
in at most 0{M) operations over numbers of size at most O(MlogM). We start with an estimation of the 
time required to compute the entries of P. Using the above observation, and the form of entries of P, we 
have the following observation for P. 

Observation B.21. Each entry of P is of size at most OiNva&xaa), and can be computed in Nmaxaa 
operations over integers of size 0{Nmaxa(j). 

We now proceed to estimate the complexity of computing entries of the matrix R. We first get a bound 
on the time required to pre-compute the (L + 1 ) x (L + 1 ) matrix J2 defined in the description of the mutation 



chain in Appendix B.l 



Observation B.22. Let b be the number of bits required to represent }X. We have 



^. = (i-m)Mt^ I 



L — s\ s 



l-ji J rf\ X J \t-xj \\- n 

Hence, each entry of JS is of size at most 0{bL) and can be pre-computed in time 0{L^) operations over 
numbers of size 0{bL). 

Proof of Theorem \5.9\ Notice that the number of terms in the sum in equation ([8]) for the computation of the 
entry of the matrix R{[h], [h'] ) is at most 






L 



N 



UL+l) 



< [e(\ + — ^ ) ) , by the AM-GM inequality. 



We will use the shorthands S = (^+^) < ^-^^^ and 



G={e[\+ ^ 



L(L+i; 



L(L+1) 



Notice that R is of dimension at most S x S. Computing the products of all the J2i/s in each of these terms 
takes N multiplications on numbers of size at most 0{bL), and hence produces a number of size 0{NbL) 
in time at most 0{NbL). Computing the products of all the multinomial coefficients in each of the terms 
takes 0{N) multiplications on integers of size logN, thus producing an integer of size 0{N\ogN) in time 
0{N\ogN). The total size of each entry of R is thus at most s\ = 0{logG+NlogN + NbL), and the entry 
can be computed in time ti = 0{Gsi). All the entries of/? can thus be computed in time Sh\. 

Notice that ^^ = PR, and given the above estimates on the sizes of the entries of P and R and the times 
required to compute their entries, each entry of ^„, is of size at most S2 = 0{\ogS + si +Nmaxa(y), and 
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can be computed in time 0{Ss2), given the entries of P and R. Thus, the time taken for the computation of 
^w, including the computation of R and P and the pre-computation of J2 is 

Ti = d{bL^ + S^S2 + S^Gsi) 

and each entry of J^w is of size 

S2 = d{NbL+Nmaxaa+L^). 

The time required to compute an exact solution of ^„%w = %„ with the restriction 1 1 TTwl 1 1 = 1 using Gaussian 
elimination is thus of the order OiS^sj). Comparing this with T\, we get that the total running time is 0(J\\ 
which is what we sought to prove. D 

B.6 Proof of Theorem l5J0l 



Input: An initial state Oq G ^y,, , an £ > 0, grid resolution 5, accuracy parameter b\ , and error probability 
52, L,N and an A which is class invariant. 

Goal: To estimate lx^(e,N). 

Output: ^o suchthat^Uo > }i'^{e + 5i/l) and<i''(D^^g_5,U) > e-5i/l. 



Let £' = 2l2 (the distance from stationarity), c = \j^'\ and s = ^log(2c(L+ l)/52) 
pies from the distribution). 
For 5 < /x < 1/2 in steps of 5, 



(number of sam- 



1. Let T = T(L,Af,A,^,£'). 

2. Let Tix,k, for ^ = 1, . . . ,s, denote s independent samples from the RSM process with parameters 
L,N,A and /i starting from the initial state Oq. 

3. Letz/=in^iD,,,. 

4. If <i''(Z„ U)<e then Return n and Stop. 

5. Else /I = IJ. + 5. 



Figure 1 : The Algorithm ErrorThreshold 



In this section, we give a proof of Theorem |5.10 Recall that ;r„. denotes the stationary distribution of the 
projected RSM chain described in Section [5?T| Using the definition of the mixing time, and the Chernoff- 
Hoeffding bound, we have the following lemma in order to bound the number of samples s required in each 
iteration of the algorithm: 

Lemma B.23. Suppose we take s samples from independent realizations of the fully mixed projected RSM 
process, denoting the samples so obtained as D(/) for i = 1,2,. .. ,s. For any fixed genome <J, we then have 
E [D^(/)] = E;r„. [D'^jfor all i. Now for every e > 



£d-(/)-e,„,[d^ 



i=i 



>£ 



<2exp(-2e^i') 
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In particular, if we run s independent realizations of the chain up to the time t{L,N,A,ij.,£/2), and let D; 
denote the sample obtained by the ith realization, then 



£d-(/)-e,,„[d-] 



i=i 



>£ 



<2exp(-£V2) 



Proof of Theorem 5.10 Consider the random variables Zs in any of the at most c iterations in ErrorThresh- 
OLD. By the choice of e' and s, we get from the above lemma and a union bound that all the c different 
variables Zs that we get across all the iterations of the loop satisfy 



-IE.„,[D]|L<5i/(2l2 



(17) 



with probability at least 1 — 52- In the rest of the proof, we condition on the above event occurring. 

Since the average Hamming distance can be at most L, and we are running the projected chains till 
'z{L,N,A,iJ.,5i/{2L^)) for each n, we get that for every jj. considered by the algorithm 

\d'\7i„,^,U)-d\Z,,U)\<5i/2. 

We therefore deduce that when the algorithm outputs a jj., (i''(;r„,^^,U) < £ + 5i/2, using the conditioning 



on the event in equation ( 17 1. Similarly, by noticing that the algorithm had d^{Zs,U) > e for n — d, we get 
that (i''(7rjv.^_5,U) > £ — 5i/2. The estimate of the running time follows by noticing that assuming bits with 
bias jj. can be sampled in 0(log(l//i)) time, it takes time 0{NLmaxaclog{l / jj.)) to simulate each step of 
the ^w chain. The bound on the error probability follows from the conditioning used on the validity of 
equation (flT). D 



We comment briefly on a technical point about the definition of the error threshold that has been used in 
the literature (and that we use too). With this definition, there might exist a /x satisfying 1/2 > /i > /!''(£) such 
that (i''(v,U) > £. An analogous condition might hold in the finite population case too. If we could preclude 
the occurrence of such anomalous behavior of error-thresholds, we would be able to improve the guarantee 
on the output jjLq of the algorithm ErrorThreshold to be of the form ijij!{e + 5i,N)<iXo< ix'j{e -di) + d. 
We observe, however, that somewhat surprisingly, even for the simpler case of the quasispecies model, to 
the best of our knowledge, no attempts have been made to rigorously prove that such anomalous behavior 
cannot occur. We leave the resolution of this point for the finite population case as an open problem. 
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